Speech-processing system



BACKGROUND OF THE INVENTION 

The invention relates to a speech-processing system
comprising speech-recognition means for processing a signal
(DATA) fed from a source to a speech input of said
speech-processing system.

It is known that the quality of speech recognition at the
receiving side of, e.g., a GSM link [GSM = Global System for
Mobile communications] is currently insufficient. If the
recogniser is located within the network, the recognition result
on the received and decoded GSM speech signal is affected partly
by the artificially generated noise that is added on the basis of
the silence detected at the transmission side, and partly by the
noise and disturbances that result from transmission errors on
the radio path. To improve recognition, it is customary to
collect speech material that has been transmitted by way of GSM
and to use said material to develop new speech models, which are
trained on speech signals containing (artificially generated)
noise and distortions due to transmission errors. As a result,
the mismatch between the training situation and the recognition
reality may be reduced.

The known approach has the following drawbacks. Because training
is done on the received and decoded speech signals, the
performance of the speech recogniser can be improved only
marginally, since:

1) decoding of, e.g., encoded GSM signals is not standardised
(only encoding is standardised), which means that in
practice situations arise in which the speech recogniser
has been trained on a GSM speech decoder other than the one
applied at the input of the recogniser. The error
correction applied in the decoder, for example, is changed
regularly whenever the manufacturer finds an improved way
of processing transmission errors (which give rise to
damaged speech) so that a large part of said errors is
hidden (and therefore not or hardly noticeable to the human
ear). This results in a mismatch between the training set
on which the speech models are based and the actual speech.



WO 00/72307    PCT/EP00/03738



2) by training on speech having transmission errors, one
admittedly already models the errors in the speech models
(which thereby become more complex), but there is no
guarantee that the overall quality of the recognition
increases, since the rule "garbage in, garbage out" often
applies.

3) it is not known in advance whether a signal contains speech
or silence (from the transmitting side). Since artificially
generated noise (comfort noise) is added at the receiving
side when silences have been observed, the performance of
the speech recognition declines, because the recogniser
will attempt to "recognise" the noise.

SUMMARY OF THE INVENTION 

The object of the invention is to overcome said drawbacks
and to improve the performance of automatic speech-recognition
systems operating at the receiving side of a
speech-frame-oriented telephone-speech link. This may be, e.g.,
GSM, UMTS [= Universal Mobile Telecommunications System] or
Voice-over-IP [IP = Internet Protocol]. The core of the
invention is that, at the receiving side, not only a speech
signal is offered to the speech-recognition system, but also
signal parameters, which give information on characteristics of
the signal received.

These are, e.g., parameters indicating the presence or
absence of speech energy in the signal received, or the
reliability of the signal received according to redundancy
checks added at the transmitting side (e.g., CRCs [= Cyclic
Redundancy Checks]).

In the case of GSM, such parameters are calculated per
frame. Here, the parameters of interest in the framework of
the invention are, inter alia, the BFI (= Bad Frame
Indicator) calculated from, e.g., the CRC values per frame, and
the SID (= Silence Descriptor) derived from a parameter SP (=
SPeech flag). So far, said parameters have been used in GSM only
for detecting errors in the speech frames received, or for
transmitter control (transmit only if speech is present), as the
case may be.

Control of a speech recogniser by classifying parameters
promotes the accuracy of the recognition, since the artificially
generated noise may be ignored and defective frames may either be
ignored or adjusted, e.g., partially processed. Apart from the
parameters referred to above (the BFI and the SID), use is also
made of an encoding-mode parameter defining the significance of
the speech-frame bits (FR [= Full Rate], EFR [= Enhanced Full
Rate], or the various modes under which AMR [= Adaptive Multi
Rate] may operate). On the basis hereof, the recognition
algorithm operative in the speech recogniser is adjusted to the
characteristics with which the speech signal is encoded and
decoded.
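
The three control parameters named above could be bundled per frame roughly as follows. This is only a minimal sketch: the names `CodingMode`, `ControlParams` and `recogniser_action`, and the convention that `sid=True` marks a frame without speech, are assumptions made for illustration and are not taken from the GSM specifications or from this application.

```python
from dataclasses import dataclass
from enum import Enum


class CodingMode(Enum):
    """Speech codec in use (the CM parameter); member names are illustrative."""
    FR = "full rate"
    EFR = "enhanced full rate"
    AMR = "adaptive multi rate"


@dataclass(frozen=True)
class ControlParams:
    """Side information offered to the recogniser next to the speech data."""
    bfi: bool        # Bad Frame Indicator: True if the frame could not be corrected
    sid: bool        # assumed convention: True if the frame carries no speech
    cm: CodingMode   # coding mode under which the frame bits must be interpreted


def recogniser_action(params: ControlParams) -> str:
    """Decide how the recogniser should treat one frame."""
    if params.sid:
        return "skip"      # silence or comfort noise: nothing to recognise
    if params.bfi:
        return "partial"   # damaged frame: ignore it or use only the intact part
    return "process"       # intact speech frame


print(recogniser_action(ControlParams(bfi=False, sid=False, cm=CodingMode.EFR)))
```

Checking the silence indication before the BFI is a design choice of this sketch: a frame without speech need not be examined further, whether or not it is damaged.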

It is noted that control of speech-processing means by error
parameters is known from EP0854622. The known system, however,
aims at improving the tone quality in voice transmission and
reproduction, while the present invention relates to the field of
speech recognition, which means conversion from speech (spoken
words) to text (printable characters).

The operation of the invention is further explained with
reference to several figures. As an example, we take the current
part of the GSM system which makes use of an Enhanced Full Rate
(= EFR) codec. The same does not apply, however, to a Full
Rate (= FR) codec, nor to the (future) Adaptive Multi Rate (=
AMR) codec. FIG. 1 shows two terminals: a first, mobile
terminal, such as a GSM handset, and a second, non-mobile
terminal, such as a GSM base station, which are capable of
communicating with one another by way of a wireless medium 9. In
the figure, only upstream communication is presented, from
handset to base station.

The handset shown in the top part of FIG. 1 comprises two
modules or subsystems, namely a TX/DTX Handler 1 (DTX stands for
Discontinuous Transmission) and a TX Radio Subsystem 2. Module 1
comprises a microphone 3, a speech encoder 4 and a Voice Activity
Detector (= VAD) 5. Module 2 comprises a channel encoder 6, a
SPeech-flag monitor 7 and a transmitter 8. Signals received by
the microphone 3 are fed both to the speech encoder 4 and to the
VAD 5.

In the VAD 5, it is detected whether the microphone 3 is
receiving speech or silence. This is encoded with a "SPeech
flag" (= SP), which is sent along with each speech frame. In the
channel encoder 6, the microphone signal encoded in encoder 4 is
[...] incorrectly transmitted frame may be corrected using said
redundant information.

During the setup of the link, it is determined which 
encoding algorithm is used, which may be represented by the 
parameter CM (= "coding mode"). In the event of specific speech 
codecs (e.g., AMR), the "coding-mode" parameter for each frame is 
sent along, and the recogniser is dynamically driven thereby. In 
the event of other speech codecs, the parameter is transmitted to 
the receiving side only once, at the start of a session. 
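
The two ways of conveying the coding mode described above (once per session, or along with every frame) might be handled on the receiving side as sketched below. The class name `CodingModeTracker` and the mode strings such as `"AMR-12.2"` are hypothetical labels chosen for this illustration only.

```python
from typing import Optional


class CodingModeTracker:
    """Keeps track of the coding mode (CM) that applies to each incoming frame."""

    def __init__(self, session_cm: str) -> None:
        # CM as determined once, during the setup of the link.
        self.current = session_cm

    def update(self, frame_cm: Optional[str]) -> str:
        """Return the CM for this frame.

        frame_cm is None for codecs that do not send the CM with each frame;
        in that case the value from the link setup remains in force.
        """
        if frame_cm is not None:
            self.current = frame_cm
        return self.current


tracker = CodingModeTracker("AMR-12.2")
print(tracker.update("AMR-7.95"))  # per-frame CM drives the recogniser dynamically
```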

Transmitter 8 thus transmits a frame-encoded signal
containing data (the signal proper), the parameter SP, the
parameter CM (for specific speech codecs) and redundant
information, as contained in the checksum CRC.
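
The contents of a transmitted frame as listed above can be pictured with the following sketch. It is an illustration under stated assumptions: the `Frame` layout and the helper names are hypothetical, and `zlib.crc32` merely stands in for the checksum; it is not the CRC that GSM channel coding actually applies.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    data: bytes          # the encoded speech (the signal proper)
    sp: bool             # SPeech flag produced by the VAD
    cm: Optional[str]    # coding mode, present only for codecs that send it per frame
    crc: int             # redundant information computed over the data


def make_frame(data: bytes, sp: bool, cm: Optional[str] = None) -> Frame:
    """Transmitting side: attach the checksum to the frame."""
    return Frame(data=data, sp=sp, cm=cm, crc=zlib.crc32(data))


def is_intact(frame: Frame) -> bool:
    """Receiving side: check the data against the redundant information."""
    return zlib.crc32(frame.data) == frame.crc


sent = make_frame(b"\x01\x02", sp=True)
received = Frame(data=b"\x01\x03", sp=sent.sp, cm=sent.cm, crc=sent.crc)
print(is_intact(sent), is_intact(received))
```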

The receiving terminal at the bottom of FIG. 1 comprises
two modules or subsystems in a GSM base station, namely an RX
Radio System 11, the counterpart of module 2 of the handset,
and an RX DTX Handler 12, the counterpart of module 1. Module
11 comprises a receiver 13, a channel-decoding and
error-correcting module 14 and a parameter detector 15; the
latter detects the presence and the value of the parameter SP
sent along with the data signal and, if present, the parameter
CM. Module 12 comprises a speech decoder 16 and a further
processing module 17.

The input of a speech-recognition module 20 is (incidentally,
per se in conformity with the prior art) connected to the output
of the channel decoder 14. The speech recogniser 20 therefore
processes the data signal (speech) that has not yet been
speech-decoded. In conformity with the present invention, the
speech recogniser 20 is driven by one or more signal parameters,
which are received by way of detector 15. The basis of the
parameter SP is formed at the transmitting side in the GSM
handset, independently from the signal contents of the data
signal received. In the error-correcting module 14, the frames
received are checked for correctness, prior to decoding, against
the redundant information sent along. Incorrect frames are
earmarked as such or, if possible, repaired (in simple cases).
Correct frames are passed on to the speech decoder 16. When it
is not possible to correct a frame, module 14 issues a BFI (= Bad
Frame Indicator) parameter to detector module 15. According to
the invention, said BFI is passed on not only to the speech
decoder 16 but to the speech recogniser 20 as well. Upon receipt
of said BFI, the speech recogniser 20 ignores the input offered,
or still attempts to recognise that part of the frame which may
be earmarked as being correct (although the BFI has been set).
In other words, the value of the BFI parameter operates as a
control parameter for the speech recogniser, as a result of
which it processes only correct frames in one go.

Of frames earmarked as being damaged, an attempt is made to use
only that part which is still correct, and frames earmarked as
being wholly incorrect are ignored. The fact that part of a
frame may still be correct even though the BFI flag is set is
due to the bits in the speech frames being divided into several
classes (in GSM: 1A, 1B and 2).

Not every class is "protected" in the same manner by adding
redundant information. For GSM, e.g., if bits of class 1A are
characterised as being "damaged" (on the basis of the CRC), the
BFI flag is set (some manufacturers also set said flag in the
event of damaged 1B bits).
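
How a BFI flag might be derived from a parity check over the class 1A bits only is sketched below. The generator polynomial, the bit ordering and the function names are assumptions chosen to keep the example short; the sketch shows the principle of a CRC over one bit class and does not reproduce the exact GSM channel-coding procedure.

```python
from typing import List, Sequence

# Example generator polynomial x^3 + x + 1, written as coefficient bits (assumption).
POLY = (1, 0, 1, 1)


def crc_remainder(bits: Sequence[int], poly: Sequence[int] = POLY) -> List[int]:
    """Remainder of the bit sequence, shifted by the CRC width, modulo poly (GF(2))."""
    width = len(poly) - 1
    reg = list(bits) + [0] * width
    for i in range(len(bits)):
        if reg[i]:
            # Subtract (XOR) the polynomial aligned at position i.
            for j, coeff in enumerate(poly):
                reg[i + j] ^= coeff
    return reg[-width:]


def bad_frame_indicator(class_1a: Sequence[int], parity: Sequence[int]) -> bool:
    """True when the received parity bits do not match the class 1A bits."""
    return crc_remainder(class_1a) != list(parity)


class_1a = [1, 0, 1, 1, 0, 0, 1, 0]
parity = crc_remainder(class_1a)       # computed at the transmitting side
damaged = list(class_1a)
damaged[3] ^= 1                        # one bit damaged on the radio path
print(bad_frame_indicator(class_1a, parity), bad_frame_indicator(damaged, parity))
```

Because only the class 1A bits enter the check in this sketch, a set flag says nothing about the state of the 1B and 2 bits, which is why part of a flagged frame may still be usable.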

This need not signify, however, that all remaining bits are
damaged as well. The recogniser takes feature vectors as its
input (Rabiner & Juang, 1993). Each speech frame is converted
into a feature vector. The values of the undamaged part of the
speech frame may still be offered to the recogniser. This may be
realised, e.g., by giving the corrupted features in the feature
vectors one specific value which results in a nil effect on the
score of the signal received (De Veth, Cranen & Boves, 1998), or
by ignoring the entire frame (Lippmann & Carlson, 1997).

In approximately the same way, the SID parameter affects the
speech recogniser 20. The SID parameter is derived from the
value of the SPeech flag as issued by the Voice Activity
Detector 5 and transmitted by transmitter 8. In the event of
speech, the SP receives a specific value, as does the SID;
should speech be lacking (silence), the SP and thereby the SID
parameter will receive another value. The result is that the
speech recogniser is enabled in the event of the transfer of a
real speech signal and disabled in the event of the absence of
speech.

Finally, as indicated above, it is possible to set the operation
of the speech recogniser 20 as a function of the encoding
algorithm of the speech encoder 4 (e.g., FR, EFR, AMR etc.). In
the figure, this is done by the parameter CM, which is
determined by way of the handshake (and therefore during the
setup of the link), or sent along with each speech frame.
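
The idea of letting corrupted features have a nil effect on the score, and of disabling the recogniser when speech is absent, can be illustrated as follows. This is a sketch under assumptions, not the method of the cited authors: the score is taken to be the log-likelihood of a diagonal Gaussian, a corrupted feature simply contributes zero to it, and the function and parameter names are hypothetical.

```python
import math
from typing import Optional, Sequence


def frame_log_score(
    features: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
    reliable: Sequence[bool],
    speech_present: bool = True,
) -> Optional[float]:
    """Score one feature vector, or return None if the frame is to be skipped."""
    # Skip frames without speech and frames with no reliable feature at all.
    if not speech_present or not any(reliable):
        return None
    total = 0.0
    for x, mean, var, ok in zip(features, means, variances, reliable):
        if ok:
            # Per-feature Gaussian log-likelihood; corrupted features add nothing.
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
    return total


# The second feature is marked as corrupted, so its value does not matter.
a = frame_log_score([1.0, 5.0], [1.0, 0.0], [1.0, 1.0], [True, False])
b = frame_log_score([1.0, -9.0], [1.0, 0.0], [1.0, 1.0], [True, False])
print(a == b)
```
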

CLAIMS

1. Speech-processing system, comprising speech-recognition
means (20) for processing a signal entered from a source (1, 2)
to a speech input (DATA), characterised by means for affecting
the operation of the speech-recognition means by one or more
control parameters (CM, SID, BFI) entered by way of a control
input, each control parameter relating to a specific
characteristic of the signal entered from the source to the
speech-recognition means (DATA).

2. Speech-processing system according to claim 1,
characterised in that a first control parameter (BFI) relates to
the reliability or correctness of the signal entered and that the
operation of the speech-recognition means (20) is adjusted to the
reliability or correctness, as the case may be, indicated by said
first control parameter, of the signal entered.

3. Speech-processing system according to claim 1,
characterised in that a second control parameter (SID) relates to
the speech/noise ratio and that the operation of the
speech-recognition means (20) is adjusted to the speech/noise
ratio of the signal entered indicated by said second control
parameter.

4. Speech-processing system according to claim 1, the signal
entered to the speech-recognition means (20) being encoded in
speech-encoding means (4) at the source, characterised in that a
third control parameter (CM) relates to the speech-encoding mode
in the speech-encoding means, the operation of the
speech-recognition means (20) being adjusted to the
speech-encoding mode indicated by said third control parameter.

5. Telecommunications system, comprising a first terminal (1,
2) having speech- and channel-encoding means (4, 6), a
transmission medium (9) and a second terminal (11, 12) having
channel- and speech-decoding means (13, 16) and a
speech-processing system according to claim 1, said signal (DATA)
being offered from the first terminal, by way of the transmission
medium, to the speech input of the speech recogniser of the
second terminal, and each control parameter (CM, SID, BFI) being
offered by the first terminal, by way of the transmission medium,
to the control input intended for that purpose of the
speech-processing system of the second terminal.

Third inventor:

Full name: HUISMAN, Victor Casper Alexander
           (last, first, middle)

Residence address: Wrightlaan 11
                   2289 [illegible] RIJSWIJK, The Netherlands

Post Office address: P.O. Box 95321
                     2509 CH The Hague, The Netherlands

Fourth inventor:

Full name: BOVES, Lodewijk Willem Johan
           (last, first, middle)

Residence address: Rembrandtstraat 18
                   6521 ME NIJMEGEN, The Netherlands

Post Office address: P.O. Box 95321
                     2509 CH The Hague, The Netherlands

Citizenship: The Netherlands

Signature:
Date:

Power of attorney:

As a named inventor, I hereby appoint:

Peter L. Michaelson (Reg. No. [illegible])
Robert M. Wallace (Reg. No. 29,119)
Jeremiah G. Murray (Reg. No. 20,533)
John T. Peoples (Reg. No. 28,250)
Ronald L. Drumheller (Reg. No. 25,674)
Edward M. Fink (Reg. No. 19,640)
Christopher Balzan (Reg. No. 40,901)
Eric Agaard (Reg. No. 40,478)

as my attorneys to prosecute this application and to transact all business in
the United States Patent and Trademark Office in connection therewith.

Direct all correspondence to Customer Number 007265 at the following address:

MICHAELSON & WALLACE
Parkway 109 Office Center
Newman Springs Road
P.O. Box 8489
Red Bank, New Jersey 07701.

Direct all telephone calls to: (732) 530-6671.

I hereby declare that all statements made herein of my own knowledge are true
and that all statements made on information and belief are believed to be
true; and further that these statements were made with the knowledge that
willful false statements and the like so made are punishable by fine or
imprisonment, or both, under Section 1001 of Title 18 of the United States
Code and that such willful false statements may jeopardize the validity of the
application or any patent issued thereon.